Overview

Brought to you by YData

Dataset statistics

                                 Dataset A   Dataset B
Number of variables              12          12
Number of observations           446         446
Missing cells                    446         431
Missing cells (%)                8.3%        8.1%
Duplicate rows                   0           0
Duplicate rows (%)               0.0%        0.0%
Total size in memory             45.3 KiB    45.3 KiB
Average record size in memory    104.0 B     104.0 B
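
The missing-cell percentages in this table are consistent with dividing the missing-cell count by the total number of cells (observations times variables). The sketch below checks that reading against the reported counts; the helper name `missing_cell_pct` is hypothetical and is not part of ydata-profiling.

```python
def missing_cell_pct(missing_cells: int, n_obs: int, n_vars: int) -> float:
    """Percentage of missing cells over all cells (assumed definition)."""
    return 100.0 * missing_cells / (n_obs * n_vars)


# Counts copied from the Dataset statistics table.
pct_a = missing_cell_pct(446, 446, 12)  # Dataset A
pct_b = missing_cell_pct(431, 446, 12)  # Dataset B

print(f"Dataset A: {pct_a:.1f}%")
print(f"Dataset B: {pct_b:.1f}%")
```

Rounded to one decimal place, both results agree with the percentages listed above.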

Variable types

               Dataset A   Dataset B
Numeric        5           5
Categorical    4           4
Text           3           3

Alerts

Dataset A                                         Dataset B                                         Alert
Sex is highly overall correlated with Survived    Sex is highly overall correlated with Survived    High Correlation
Survived is highly overall correlated with Sex    Survived is highly overall correlated with Sex    High Correlation
Age has 97 (21.7%) missing values                 Age has 90 (20.2%) missing values                 Missing
Cabin has 348 (78.0%) missing values              Cabin has 340 (76.2%) missing values              Missing
PassengerId has unique values                     PassengerId has unique values                     Unique
Name has unique values                            Name has unique values                            Unique
SibSp has 299 (67.0%) zeros                       SibSp has 315 (70.6%) zeros                       Zeros
Parch has 342 (76.7%) zeros                       Parch has 327 (73.3%) zeros                       Zeros
Fare has 10 (2.2%) zeros                          Fare has 8 (1.8%) zeros                           Zeros

Reproduction

                          Dataset A                     Dataset B
Analysis started          2024-07-15 20:42:40.610096    2024-07-15 20:42:44.823775
Analysis finished         2024-07-15 20:42:44.822629    2024-07-15 20:42:49.043833
Duration                  4.21 seconds                  4.22 seconds
Software version          ydata-profiling v0.0.dev0     ydata-profiling v0.0.dev0
Download configuration    config.json                   config.json
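
The reported durations can be recovered from the start and finish timestamps. The following is a minimal sketch using only the Python standard library; the function name `duration_seconds` and the format string are assumptions chosen to match the timestamps as printed in this table.

```python
from datetime import datetime

# Timestamp layout as printed in the Reproduction table (assumed).
FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed seconds between two timestamps in the report's format."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


dur_a = duration_seconds("2024-07-15 20:42:40.610096", "2024-07-15 20:42:44.822629")
dur_b = duration_seconds("2024-07-15 20:42:44.823775", "2024-07-15 20:42:49.043833")

print(f"Dataset A: {dur_a:.2f} seconds")
print(f"Dataset B: {dur_b:.2f} seconds")
```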

Variables

PassengerId
Real number (ℝ)

                Dataset A   Dataset B
Distinct        446         446
Distinct (%)    100.0%      100.0%
Missing         0           0
Missing (%)     0.0%        0.0%
Infinite        0           0
Infinite (%)    0.0%        0.0%
Mean            441.5       460.98206
Minimum         1           1
Maximum         891         891
Zeros           0           0
Zeros (%)       0.0%        0.0%
Negative        0           0
Negative (%)    0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Quantile statistics

                             Dataset A   Dataset B
Minimum                      1           1
5-th percentile              42.25       62.25
Q1                           226.25      230.25
median                       426.5       472
Q3                           673.5       684.75
95-th percentile             843.5       854.75
Maximum                      891         891
Range                        890         890
Interquartile range (IQR)    447.25      454.5

Descriptive statistics

                                   Dataset A       Dataset B
Standard deviation                 258.85613       256.75278
Coefficient of variation (CV)      0.58631061      0.55696914
Kurtosis                           -1.2048005      -1.2100875
Mean                               441.5           460.98206
Median Absolute Deviation (MAD)    222.5           224
Skewness                           0.016159286     -0.06852953
Sum                                196909          205598
Variance                           67006.498       65921.991
Monotonicity                       Not monotonic   Not monotonic
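
Several of the PassengerId figures are linked by standard definitions: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the IQR is Q3 minus Q1. The sketch below recomputes them from the reported values as a consistency check. It assumes the report uses these textbook definitions, and `derived_stats` is a hypothetical helper, not a ydata-profiling function.

```python
def derived_stats(std_dev: float, mean: float, q1: float, q3: float) -> dict:
    """Recompute CV, variance, and IQR from already-reported summary values."""
    return {
        "cv": std_dev / mean,        # coefficient of variation
        "variance": std_dev ** 2,    # squared standard deviation
        "iqr": q3 - q1,              # interquartile range
    }


# Values copied from the PassengerId tables.
stats_a = derived_stats(258.85613, 441.5, 226.25, 673.5)       # Dataset A
stats_b = derived_stats(256.75278, 460.98206, 230.25, 684.75)  # Dataset B

print(stats_a)
print(stats_b)
```

Small differences in the last printed digits are expected, since the inputs are themselves rounded values from the report.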
Histogram with fixed size bins (bins=50)

Common values

Dataset A                                      Dataset B
Value                 Count   Frequency (%)    Value                 Count   Frequency (%)
65                    1       0.2%             540                   1       0.2%
226                   1       0.2%             108                   1       0.2%
683                   1       0.2%             500                   1       0.2%
328                   1       0.2%             669                   1       0.2%
190                   1       0.2%             201                   1       0.2%
611                   1       0.2%             819                   1       0.2%
507                   1       0.2%             394                   1       0.2%
234                   1       0.2%             604                   1       0.2%
136                   1       0.2%             229                   1       0.2%
201                   1       0.2%             297                   1       0.2%
Other values (436)    436     97.8%            Other values (436)    436     97.8%

Smallest 10 values

Dataset A                         Dataset B
Value   Count   Frequency (%)     Value   Count   Frequency (%)
1       1       0.2%              1       1       0.2%
2       1       0.2%              3       1       0.2%
3       1       0.2%              4       1       0.2%
5       1       0.2%              10      1       0.2%
8       1       0.2%              15      1       0.2%
9       1       0.2%              16      1       0.2%
12      1       0.2%              18      1       0.2%
13      1       0.2%              26      1       0.2%
15      1       0.2%              28      1       0.2%
17      1       0.2%              30      1       0.2%

Survived
Categorical

                Dataset A   Dataset B
Distinct        2           2
Distinct (%)    0.4%        0.4%
Missing         0           0
Missing (%)     0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Length

                 Dataset A   Dataset B
Max length       1           1
Median length    1           1
Mean length      1           1
Min length       1           1

Characters and Unicode

                       Dataset A   Dataset B
Total characters       446         446
Distinct characters    2           2
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A   Dataset B
Unique        0           0
Unique (%)    0.0%        0.0%

Sample

           Dataset A   Dataset B
1st row    0           1
2nd row    1           1
3rd row    0           0
4th row    1           1
5th row    0           0

Common Values

         Dataset A                Dataset B
Value    Count   Frequency (%)    Count   Frequency (%)
0        272     61.0%            268     60.1%
1        174     39.0%            178     39.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A, Dataset B (plots not reproduced)

Most occurring characters

         Dataset A                Dataset B
Value    Count   Frequency (%)    Count   Frequency (%)
0        272     61.0%            268     60.1%
1        174     39.0%            178     39.9%

Most occurring categories

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    446     100.0%           446     100.0%

Most occurring scripts

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    446     100.0%           446     100.0%

Most occurring blocks

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    446     100.0%           446     100.0%

Pclass
Categorical

                Dataset A   Dataset B
Distinct        3           3
Distinct (%)    0.7%        0.7%
Missing         0           0
Missing (%)     0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Length

                 Dataset A   Dataset B
Max length       1           1
Median length    1           1
Mean length      1           1
Min length       1           1

Characters and Unicode

                       Dataset A   Dataset B
Total characters       446         446
Distinct characters    3           3
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A   Dataset B
Unique        0           0
Unique (%)    0.0%        0.0%

Sample

           Dataset A   Dataset B
1st row    1           1
2nd row    3           1
3rd row    1           2
4th row    3           1
5th row    2           3

Common Values

         Dataset A                Dataset B
Value    Count   Frequency (%)    Count   Frequency (%)
3        246     55.2%            246     55.2%
2        102     22.9%            90      20.2%
1        98      22.0%            110     24.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A, Dataset B (plots not reproduced)

Most occurring characters

         Dataset A                Dataset B
Value    Count   Frequency (%)    Count   Frequency (%)
3        246     55.2%            246     55.2%
2        102     22.9%            90      20.2%
1        98      22.0%            110     24.7%

Most occurring categories

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    446     100.0%           446     100.0%

Most occurring scripts

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    446     100.0%           446     100.0%

Most occurring blocks

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    446     100.0%           446     100.0%

Name
Text

                Dataset A   Dataset B
Distinct        446         446
Distinct (%)    100.0%      100.0%
Missing         0           0
Missing (%)     0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Length

                 Dataset A   Dataset B
Max length       82          82
Median length    51          50
Mean length      26.79148    26.94843
Min length       12          12

Characters and Unicode

                       Dataset A   Dataset B
Total characters       11949       12019
Distinct characters    59          59
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A   Dataset B
Unique        446         446
Unique (%)    100.0%      100.0%

Sample

           Dataset A                          Dataset B
1st row    Stewart, Mr. Albert A              Frolicher, Miss. Hedwig Margaritha
2nd row    Niskanen, Mr. Juha                 Swift, Mrs. Frederick Joel (Margaret Welles Barron)
3rd row    Isham, Miss. Ann Elizabeth         Hodges, Mr. Henry Price
4th row    Landergren, Miss. Aurora Adelia    Duff Gordon, Lady. (Lucille Christiana Sutherland) ("Mrs Morgan")
5th row    Hold, Mr. Stephen                  Garfirth, Mr. John

Dataset A                                      Dataset B
Value                 Count   Frequency (%)    Value                 Count   Frequency (%)
mr                    262     14.5%            mr                    263     14.5%
miss                  84      4.6%             miss                  88      4.8%
mrs                   61      3.4%             mrs                   67      3.7%
william               36      2.0%             william               28      1.5%
master                26      1.4%             john                  21      1.2%
john                  23      1.3%             master                20      1.1%
henry                 21      1.2%             henry                 17      0.9%
charles               15      0.8%             charles               14      0.8%
thomas                14      0.8%             george                13      0.7%
george                11      0.6%             thomas                11      0.6%
Other values (906)    1258    69.5%            Other values (907)    1273    70.1%

Most occurring characters

Dataset A                                     Dataset B
Value                Count   Frequency (%)    Value                Count   Frequency (%)
(blank)              1366    11.4%            (blank)              1371    11.4%
r                    967     8.1%             r                    1004    8.4%
e                    846     7.1%             e                    863     7.2%
a                    803     6.7%             a                    806     6.7%
s                    658     5.5%             i                    671     5.6%
n                    653     5.5%             s                    659     5.5%
i                    643     5.4%             n                    634     5.3%
M                    544     4.6%             M                    558     4.6%
l                    528     4.4%             l                    537     4.5%
o                    522     4.4%             o                    516     4.3%
Other values (49)    4419    37.0%            Other values (49)    4400    36.6%

Most occurring categories

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    11949   100.0%           12019   100.0%

Most occurring scripts

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    11949   100.0%           12019   100.0%

Most occurring blocks

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    11949   100.0%           12019   100.0%

Sex
Categorical

                Dataset A   Dataset B
Distinct        2           2
Distinct (%)    0.4%        0.4%
Missing         0           0
Missing (%)     0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Length

                 Dataset A    Dataset B
Max length       6            6
Median length    4            4
Mean length      4.6636771    4.7040359
Min length       4            4

Characters and Unicode

                       Dataset A   Dataset B
Total characters       2080        2098
Distinct characters    5           5
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A   Dataset B
Unique        0           0
Unique (%)    0.0%        0.0%

Sample

           Dataset A   Dataset B
1st row    male        female
2nd row    male        female
3rd row    female      male
4th row    female      female
5th row    male        male

Common Values

          Dataset A                Dataset B
Value     Count   Frequency (%)    Count   Frequency (%)
male      298     66.8%            289     64.8%
female    148     33.2%            157     35.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A, Dataset B (plots not reproduced)

Most occurring characters

         Dataset A                Dataset B
Value    Count   Frequency (%)    Count   Frequency (%)
e        594     28.6%            603     28.7%
m        446     21.4%            446     21.3%
a        446     21.4%            446     21.3%
l        446     21.4%            446     21.3%
f        148     7.1%             157     7.5%

Most occurring categories

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    2080    100.0%           2098    100.0%

Most occurring scripts

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    2080    100.0%           2098    100.0%

Most occurring blocks

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    2080    100.0%           2098    100.0%

Age
Real number (ℝ)

                Dataset A    Dataset B
Distinct        74           69
Distinct (%)    21.2%        19.4%
Missing         97           90
Missing (%)     21.7%        20.2%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            29.086447    29.0075
Minimum         0.42         0.75
Maximum         80           80
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                             Dataset A   Dataset B
Minimum                      0.42        0.75
5-th percentile              4           4
Q1                           21          19
median                       28          27
Q3                           37          38
95-th percentile             53.6        54.25
Maximum                      80          80
Range                        79.58       79.25
Interquartile range (IQR)    16          19

Descriptive statistics

                                   Dataset A       Dataset B
Standard deviation                 14.166877       14.131827
Coefficient of variation (CV)      0.48706112      0.48717838
Kurtosis                           0.55558507      0.14541107
Mean                               29.086447       29.0075
Median Absolute Deviation (MAD)    8               9
Skewness                           0.4338516       0.38466988
Sum                                10151.17        10326.67
Variance                           200.70041       199.70853
Monotonicity                       Not monotonic   Not monotonic
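
Because Age has missing values, its mean and distinct percentage only match the other reported figures when they are computed over non-missing observations rather than over all 446 rows. The sketch below illustrates that reading with the numbers from the tables above. Both helper names are hypothetical, and the denominators are an inference from the reported values, not a documented ydata-profiling rule.

```python
def non_missing_mean(total: float, n_obs: int, n_missing: int) -> float:
    """Mean over observed values only: sum / (rows - missing rows)."""
    return total / (n_obs - n_missing)


def distinct_pct(n_distinct: int, n_obs: int, n_missing: int) -> float:
    """Distinct values as a percentage of non-missing rows (assumed)."""
    return 100.0 * n_distinct / (n_obs - n_missing)


# Sum, row count, missing count, and distinct count from the Age tables.
mean_a = non_missing_mean(10151.17, 446, 97)  # Dataset A
mean_b = non_missing_mean(10326.67, 446, 90)  # Dataset B
distinct_a = distinct_pct(74, 446, 97)
distinct_b = distinct_pct(69, 446, 90)

print(f"Dataset A: mean={mean_a:.6f}, distinct={distinct_a:.1f}%")
print(f"Dataset B: mean={mean_b:.6f}, distinct={distinct_b:.1f}%")
```

Dividing by all 446 rows instead would give means near 22.8 and 23.2, which do not match the table.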
Histogram with fixed size bins (bins=50)

Common values

Dataset A                                     Dataset B
Value                Count   Frequency (%)    Value                Count   Frequency (%)
24                   19      4.3%             18                   16      3.6%
22                   15      3.4%             24                   15      3.4%
30                   14      3.1%             25                   14      3.1%
18                   13      2.9%             22                   13      2.9%
28                   12      2.7%             21                   13      2.9%
36                   11      2.5%             19                   13      2.9%
25                   11      2.5%             26                   11      2.5%
32                   10      2.2%             36                   11      2.5%
23                   10      2.2%             28                   10      2.2%
29                   10      2.2%             34                   10      2.2%
Other values (64)    224     50.2%            Other values (59)    230     51.6%
(Missing)            97      21.7%            (Missing)            90      20.2%

Smallest 10 values

Dataset A                         Dataset B
Value   Count   Frequency (%)     Value   Count   Frequency (%)
0.42    1       0.2%              0.75    1       0.2%
0.83    1       0.2%              0.92    1       0.2%
0.92    1       0.2%              1       2       0.4%
1       3       0.7%              2       4       0.9%
2       3       0.7%              3       4       0.9%
3       4       0.9%              4       7       1.6%
4       6       1.3%              5       3       0.7%
5       3       0.7%              6       3       0.7%
6       3       0.7%              8       2       0.4%
7       1       0.2%              9       4       0.9%

SibSp
Real number (ℝ)

                Dataset A     Dataset B
Distinct        7             7
Distinct (%)    1.6%          1.6%
Missing         0             0
Missing (%)     0.0%          0.0%
Infinite        0             0
Infinite (%)    0.0%          0.0%
Mean            0.56278027    0.47309417
Minimum         0             0
Maximum         8             8
Zeros           299           315
Zeros (%)       67.0%         70.6%
Negative        0             0
Negative (%)    0.0%          0.0%
Memory size     7.0 KiB       7.0 KiB

Quantile statistics

                             Dataset A   Dataset B
Minimum                      0           0
5-th percentile              0           0
Q1                           0           0
median                       0           0
Q3                           1           1
95-th percentile             3           2
Maximum                      8           8
Range                        8           8
Interquartile range (IQR)    1           1

Descriptive statistics

                                   Dataset A       Dataset B
Standard deviation                 1.1971334       1.0113703
Coefficient of variation (CV)      2.1271773       2.1377781
Kurtosis                           17.284527       17.054265
Mean                               0.56278027      0.47309417
Median Absolute Deviation (MAD)    0               0
Skewness                           3.7271473       3.5561665
Sum                                251             211
Variance                           1.4331284       1.02287
Monotonicity                       Not monotonic   Not monotonic
Histogram with fixed size bins (bins=7)

         Dataset A                Dataset B
Value    Count   Frequency (%)    Count   Frequency (%)
0        299     67.0%            315     70.6%
1        108     24.2%            97      21.7%
2        15      3.4%             12      2.7%
3        7       1.6%             8       1.8%
4        8       1.8%             10      2.2%
5        4       0.9%             2       0.4%
8        5       1.1%             2       0.4%

Parch
Real number (ℝ)

                Dataset A     Dataset B
Distinct        6             6
Distinct (%)    1.3%          1.3%
Missing         0             0
Missing (%)     0.0%          0.0%
Infinite        0             0
Infinite (%)    0.0%          0.0%
Mean            0.35874439    0.41704036
Minimum         0             0
Maximum         5             5
Zeros           342           327
Zeros (%)       76.7%         73.3%
Negative        0             0
Negative (%)    0.0%          0.0%
Memory size     7.0 KiB       7.0 KiB

Quantile statistics

                             Dataset A   Dataset B
Minimum                      0           0
5-th percentile              0           0
Q1                           0           0
median                       0           0
Q3                           0           1
95-th percentile             2           2
Maximum                      5           5
Range                        5           5
Interquartile range (IQR)    0           1

Descriptive statistics

                                   Dataset A       Dataset B
Standard deviation                 0.7474314       0.80508286
Coefficient of variation (CV)      2.083465        1.9304675
Kurtosis                           8.0301305       7.2594478
Mean                               0.35874439      0.41704036
Median Absolute Deviation (MAD)    0               0
Skewness                           2.5184273       2.3739713
Sum                                160             186
Variance                           0.5586537       0.64815841
Monotonicity                       Not monotonic   Not monotonic
Histogram with fixed size bins (bins=6)

         Dataset A                Dataset B
Value    Count   Frequency (%)    Count   Frequency (%)
0        342     76.7%            327     73.3%
1        58      13.0%            65      14.6%
2        41      9.2%             48      10.8%
3        2       0.4%             2       0.4%
4        1       0.2%             1       0.2%
5        2       0.4%             3       0.7%

Ticket
Text

                Dataset A   Dataset B
Distinct        385         388
Distinct (%)    86.3%       87.0%
Missing         0           0
Missing (%)     0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Length

                 Dataset A   Dataset B
Max length       18          18
Median length    17          17
Mean length      6.793722    6.8475336
Min length       4           3

Characters and Unicode

                       Dataset A   Dataset B
Total characters       3030        3054
Distinct characters    32          35
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A   Dataset B
Unique        339         346
Unique (%)    76.0%       77.6%

Sample

           Dataset A            Dataset B
1st row    PC 17605             13568
2nd row    STON/O 2. 3101289    17466
3rd row    PC 17595             250643
4th row    C 7077               11755
5th row    26707                358585

Dataset A                                      Dataset B
Value                 Count   Frequency (%)    Value                 Count   Frequency (%)
pc                    32      5.5%             pc                    25      4.4%
c.a                   18      3.1%             c.a                   16      2.8%
ca                    9       1.6%             a/5                   10      1.8%
a/5                   9       1.6%             w./c                  8       1.4%
ston/o                8       1.4%             ston/o                7       1.2%
2                     8       1.4%             2                     7       1.2%
c                     5       0.9%             347082                5       0.9%
2343                  5       0.9%             sc/paris              5       0.9%
1601                  4       0.7%             ca                    4       0.7%
2144                  4       0.7%             soton/o.q             4       0.7%
Other values (403)    475     82.3%            Other values (410)    480     84.1%

Most occurring characters

Dataset A                                     Dataset B
Value                Count   Frequency (%)    Value                Count   Frequency (%)
3                    374     12.3%            3                    378     12.4%
1                    331     10.9%            1                    355     11.6%
2                    301     9.9%             2                    287     9.4%
7                    262     8.6%             7                    241     7.9%
4                    230     7.6%             4                    230     7.5%
6                    208     6.9%             0                    215     7.0%
5                    198     6.5%             6                    211     6.9%
0                    196     6.5%             5                    196     6.4%
9                    160     5.3%             9                    165     5.4%
8                    142     4.7%             8                    131     4.3%
Other values (22)    628     20.7%            Other values (25)    645     21.1%

Most occurring categories

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    3030    100.0%           3054    100.0%

Most occurring scripts

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    3030    100.0%           3054    100.0%

Most occurring blocks

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    3030    100.0%           3054    100.0%

Fare
Real number (ℝ)

                Dataset A    Dataset B
Distinct        182          184
Distinct (%)    40.8%        41.3%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            31.936368    34.230558
Minimum         0            0
Maximum         512.3292     512.3292
Zeros           10           8
Zeros (%)       2.2%         1.8%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                             Dataset A   Dataset B
Minimum                      0           0
5-th percentile              7.162525    7.15
Q1                           7.8958      7.9031
median                       14.25415    13.64585
Q3                           30.5        31.3875
95-th percentile             110.8833    120
Maximum                      512.3292    512.3292
Range                        512.3292    512.3292
Interquartile range (IQR)    22.6042     23.4844

Descriptive statistics

                                   Dataset A       Dataset B
Standard deviation                 50.112912       58.440025
Coefficient of variation (CV)      1.5691488       1.7072472
Kurtosis                           39.850809       31.755854
Mean                               31.936368       34.230558
Median Absolute Deviation (MAD)    6.74585         6.39585
Skewness                           5.2112307       4.9139083
Sum                                14243.62        15266.829
Variance                           2511.304        3415.2365
Monotonicity                       Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Dataset A                                      Dataset B
Value                 Count   Frequency (%)    Value                 Count   Frequency (%)
13                    23      5.2%             8.05                  24      5.4%
8.05                  21      4.7%             13                    19      4.3%
7.8958                19      4.3%             7.8958                17      3.8%
26                    16      3.6%             26                    14      3.1%
7.925                 12      2.7%             10.5                  14      3.1%
7.775                 11      2.5%             7.75                  13      2.9%
7.75                  11      2.5%             7.925                 12      2.7%
10.5                  11      2.5%             7.2292                9       2.0%
0                     10      2.2%             7.775                 9       2.0%
7.25                  8       1.8%             0                     8       1.8%
Other values (172)    304     68.2%            Other values (174)    307     68.8%

Smallest 10 values

Dataset A                          Dataset B
Value    Count   Frequency (%)     Value    Count   Frequency (%)
0        10      2.2%              0        8       1.8%
4.0125   1       0.2%              5        1       0.2%
6.4375   1       0.2%              6.4375   1       0.2%
6.45     1       0.2%              6.45     1       0.2%
6.4958   2       0.4%              6.4958   2       0.4%
6.95     1       0.2%              6.75     1       0.2%
7.0458   1       0.2%              6.8583   1       0.2%
7.05     3       0.7%              7.05     4       0.9%
7.125    2       0.4%              7.0542   2       0.4%
7.1417   1       0.2%              7.125    2       0.4%

Cabin
Text

                Dataset A   Dataset B
Distinct        88          87
Distinct (%)    89.8%       82.1%
Missing         348         340
Missing (%)     78.0%       76.2%
Memory size     7.0 KiB     7.0 KiB

Length

                 Dataset A    Dataset B
Max length       15           15
Median length    3            3
Mean length      3.4897959    3.7358491
Min length       1            1

Characters and Unicode

                       Dataset A   Dataset B
Total characters       342         396
Distinct characters    18          18
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A   Dataset B
Unique        78          70
Unique (%)    79.6%       66.0%

Sample

           Dataset A   Dataset B
1st row    C49         B39
2nd row    A31         D17
3rd row    E50         A16
4th row    C62 C64     B3
5th row    A5          D26

Dataset A                                     Dataset B
Value                Count   Frequency (%)    Value                Count   Frequency (%)
d                    2       1.8%             d                    3       2.4%
f4                   2       1.8%             e101                 3       2.4%
c92                  2       1.8%             c27                  2       1.6%
e67                  2       1.8%             b20                  2       1.6%
b98                  2       1.8%             g6                   2       1.6%
b96                  2       1.8%             c68                  2       1.6%
c68                  2       1.8%             f33                  2       1.6%
e24                  2       1.8%             b55                  2       1.6%
c65                  2       1.8%             b53                  2       1.6%
c124                 2       1.8%             b51                  2       1.6%
Other values (90)    91      82.0%            Other values (89)    105     82.7%

Most occurring characters

Dataset A                                    Dataset B
Value               Count   Frequency (%)    Value               Count   Frequency (%)
C                   39      11.4%            B                   44      11.1%
1                   31      9.1%             1                   39      9.8%
B                   30      8.8%             2                   34      8.6%
2                   27      7.9%             3                   34      8.6%
6                   27      7.9%             5                   29      7.3%
5                   23      6.7%             C                   29      7.3%
3                   22      6.4%             6                   26      6.6%
0                   19      5.6%             8                   23      5.8%
4                   18      5.3%             (blank)             21      5.3%
7                   18      5.3%             E                   18      4.5%
Other values (8)    88      25.7%            Other values (8)    99      25.0%

Most occurring categories

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    342     100.0%           396     100.0%

Most occurring scripts

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    342     100.0%           396     100.0%

Most occurring blocks

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    342     100.0%           396     100.0%

Embarked
Categorical

                Dataset A   Dataset B
Distinct        3           3
Distinct (%)    0.7%        0.7%
Missing         1           1
Missing (%)     0.2%        0.2%
Memory size     7.0 KiB     7.0 KiB

Length

                 Dataset A   Dataset B
Max length       1           1
Median length    1           1
Mean length      1           1
Min length       1           1

Characters and Unicode

                       Dataset A   Dataset B
Total characters       445         445
Distinct characters    3           3
Distinct categories    1           1
Distinct scripts       1           1
Distinct blocks        1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A   Dataset B
Unique        0           0
Unique (%)    0.0%        0.0%

Sample

           Dataset A   Dataset B
1st row    C           C
2nd row    S           S
3rd row    C           S
4th row    S           C
5th row    S           S

Common Values

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
S            325     72.9%            331     74.2%
C            85      19.1%            80      17.9%
Q            35      7.8%             34      7.6%
(Missing)    1       0.2%             1       0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A, Dataset B (plots not reproduced)

Most occurring characters

         Dataset A                Dataset B
Value    Count   Frequency (%)    Count   Frequency (%)
S        325     73.0%            331     74.4%
C        85      19.1%            80      18.0%
Q        35      7.9%             34      7.6%

Most occurring categories

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    445     100.0%           445     100.0%

Most occurring scripts

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    445     100.0%           445     100.0%

Most occurring blocks

             Dataset A                Dataset B
Value        Count   Frequency (%)    Count   Frequency (%)
(unknown)    445     100.0%           445     100.0%

Interactions

Interaction plots for Dataset A and Dataset B (figures not reproduced)

Dataset A

2024-07-15T20:42:43.474391image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/

Dataset B

2024-07-15T20:42:47.703294image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/

Correlations

[Figure: correlation plots for Dataset A and Dataset B]

Dataset A

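            |    Age | Embarked |   Fare |  Parch | PassengerId | Pclass |    Sex |  SibSp | Survived
Age         |  1.000 |    0.000 |  0.128 | -0.327 |       0.074 |  0.307 |  0.000 | -0.152 |    0.177
Embarked    |  0.000 |    1.000 |  0.224 |  0.000 |       0.000 |  0.298 |  0.094 |  0.060 |    0.138
Fare        |  0.128 |    0.224 |  1.000 |  0.390 |       0.010 |  0.474 |  0.167 |  0.454 |    0.252
Parch       | -0.327 |    0.000 |  0.390 |  1.000 |      -0.009 |  0.048 |  0.195 |  0.440 |    0.121
PassengerId |  0.074 |    0.000 |  0.010 | -0.009 |       1.000 |  0.066 |  0.082 | -0.060 |    0.000
Pclass      |  0.307 |    0.298 |  0.474 |  0.048 |       0.066 |  1.000 |  0.166 |  0.176 |    0.327
Sex         |  0.000 |    0.094 |  0.167 |  0.195 |       0.082 |  0.166 |  1.000 |  0.094 |    0.513
SibSp       | -0.152 |    0.060 |  0.454 |  0.440 |      -0.060 |  0.176 |  0.094 |  1.000 |    0.145
Survived    |  0.177 |    0.138 |  0.252 |  0.121 |       0.000 |  0.327 |  0.513 |  0.145 |    1.000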

Dataset B

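            |    Age | Embarked |   Fare |  Parch | PassengerId | Pclass |    Sex |  SibSp | Survived
Age         |  1.000 |    0.000 |  0.065 | -0.299 |       0.035 |  0.241 |  0.000 | -0.230 |    0.205
Embarked    |  0.000 |    1.000 |  0.211 |  0.056 |       0.109 |  0.246 |  0.125 |  0.092 |    0.132
Fare        |  0.065 |    0.211 |  1.000 |  0.461 |      -0.046 |  0.473 |  0.162 |  0.424 |    0.278
Parch       | -0.299 |    0.056 |  0.461 |  1.000 |      -0.015 |  0.053 |  0.232 |  0.447 |    0.217
PassengerId |  0.035 |    0.109 | -0.046 | -0.015 |       1.000 |  0.000 |  0.069 | -0.089 |    0.123
Pclass      |  0.241 |    0.246 |  0.473 |  0.053 |       0.000 |  1.000 |  0.163 |  0.110 |    0.352
Sex         |  0.000 |    0.125 |  0.162 |  0.232 |       0.069 |  0.163 |  1.000 |  0.180 |    0.601
SibSp       | -0.230 |    0.092 |  0.424 |  0.447 |      -0.089 |  0.110 |  0.180 |  1.000 |    0.151
Survived    |  0.205 |    0.132 |  0.278 |  0.217 |       0.123 |  0.352 |  0.601 |  0.151 |    1.000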
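
To see where the two datasets disagree most, the coefficients can be compared cell by cell. The following is a minimal sketch, not part of the report itself: it copies the Survived row from each correlation table and ranks the variables by how much the coefficient shifts from Dataset A to Dataset B. The helper name `largest_shifts` and the dictionary layout are hypothetical choices made for illustration.

```python
# Coefficients copied from the Survived row of each correlation table.
survived_a = {"Age": 0.177, "Embarked": 0.138, "Fare": 0.252, "Parch": 0.121,
              "PassengerId": 0.000, "Pclass": 0.327, "Sex": 0.513, "SibSp": 0.145}
survived_b = {"Age": 0.205, "Embarked": 0.132, "Fare": 0.278, "Parch": 0.217,
              "PassengerId": 0.123, "Pclass": 0.352, "Sex": 0.601, "SibSp": 0.151}


def largest_shifts(a, b, top=3):
    """Return the `top` variables whose coefficient differs most between a and b.

    Hypothetical helper: each result is (variable, b - a), ordered by the
    absolute size of the difference.
    """
    diffs = {name: round(b[name] - a[name], 3) for name in a}
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)[:top]


shifts = largest_shifts(survived_a, survived_b)
for name, delta in shifts:
    print(f"{name}: {delta:+.3f}")
```

A difference of this size between two samples of 446 rows each is not by itself evidence of a real change; the sketch only points at cells worth a closer look.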

Missing values

[Figure, Dataset A and Dataset B] A simple visualization of nullity by column.

[Figure, Dataset A and Dataset B] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

[Figure, Dataset A and Dataset B] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
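
One common way to compute a nullity correlation is to turn each column into a 0/1 missingness indicator and take the Pearson correlation of the indicators. The sketch below assumes that definition; the report does not state the exact formula it uses. The function name `nullity_correlation` and the toy columns are hypothetical and are not taken from the report's data.

```python
from math import isnan, sqrt


def nullity_correlation(col_x, col_y):
    """Pearson correlation between the missingness indicators of two columns.

    A cell counts as missing when it is None. Returns NaN when either column
    is never missing or always missing, since the correlation is undefined.
    """
    x = [1.0 if v is None else 0.0 for v in col_x]
    y = [1.0 if v is None else 0.0 for v in col_y]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy columns; None marks a missing cell.
age = [22.0, None, 50.0, None, 44.0, None]
cabin = ["C49", None, None, None, "E58", None]

r = nullity_correlation(age, cabin)
print(round(r, 3))
```

A value near 1 means the two columns tend to be missing in the same rows, a value near -1 means one tends to be present exactly when the other is absent, and a value near 0 means the missingness patterns are unrelated.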

Sample

Dataset A

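Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
64 | 65 | 0 | 1 | Stewart, Mr. Albert A | male | NaN | 0 | 0 | PC 17605 | 27.7208 | NaN | C
400 | 401 | 1 | 3 | Niskanen, Mr. Juha | male | 39.0 | 0 | 0 | STON/O 2. 3101289 | 7.9250 | NaN | S
177 | 178 | 0 | 1 | Isham, Miss. Ann Elizabeth | female | 50.0 | 0 | 0 | PC 17595 | 28.7125 | C49 | C
376 | 377 | 1 | 3 | Landergren, Miss. Aurora Adelia | female | 22.0 | 0 | 0 | C 7077 | 7.2500 | NaN | S
236 | 237 | 0 | 2 | Hold, Mr. Stephen | male | 44.0 | 1 | 0 | 26707 | 26.0000 | NaN | S
865 | 866 | 1 | 2 | Bystrom, Mrs. (Karolina) | female | 42.0 | 0 | 0 | 236852 | 13.0000 | NaN | S
838 | 839 | 1 | 3 | Chip, Mr. Chang | male | 32.0 | 0 | 0 | 1601 | 56.4958 | NaN | S
350 | 351 | 0 | 3 | Odahl, Mr. Nils Martin | male | 23.0 | 0 | 0 | 7267 | 9.2250 | NaN | S
212 | 213 | 0 | 3 | Perkin, Mr. John Henry | male | 22.0 | 0 | 0 | A/5 21174 | 7.2500 | NaN | S
379 | 380 | 0 | 3 | Gustafsson, Mr. Karl Gideon | male | 19.0 | 0 | 0 | 347069 | 7.7750 | NaN | S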

Dataset B

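Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
539 | 540 | 1 | 1 | Frolicher, Miss. Hedwig Margaritha | female | 22.0 | 0 | 2 | 13568 | 49.5000 | B39 | C
862 | 863 | 1 | 1 | Swift, Mrs. Frederick Joel (Margaret Welles Barron) | female | 48.0 | 0 | 0 | 17466 | 25.9292 | D17 | S
723 | 724 | 0 | 2 | Hodges, Mr. Henry Price | male | 50.0 | 0 | 0 | 250643 | 13.0000 | NaN | S
556 | 557 | 1 | 1 | Duff Gordon, Lady. (Lucille Christiana Sutherland) ("Mrs Morgan") | female | 48.0 | 1 | 0 | 11755 | 39.6000 | A16 | C
760 | 761 | 0 | 3 | Garfirth, Mr. John | male | NaN | 0 | 0 | 358585 | 14.5000 | NaN | S
424 | 425 | 0 | 3 | Rosblom, Mr. Viktor Richard | male | 18.0 | 1 | 1 | 370129 | 20.2125 | NaN | S
816 | 817 | 0 | 3 | Heininen, Miss. Wendla Maria | female | 23.0 | 0 | 0 | STON/O2. 3101290 | 7.9250 | NaN | S
233 | 234 | 1 | 3 | Asplund, Miss. Lillian Gertrud | female | 5.0 | 4 | 2 | 347077 | 31.3875 | NaN | S
785 | 786 | 0 | 3 | Harmer, Mr. Abraham (David Lishin) | male | 25.0 | 0 | 0 | 374887 | 7.2500 | NaN | S
596 | 597 | 1 | 2 | Leitch, Miss. Jessie Wills | female | NaN | 0 | 0 | 248727 | 33.0000 | NaN | S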

Dataset A

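Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
237 | 238 | 1 | 2 | Collyer, Miss. Marjorie "Lottie" | female | 8.0 | 0 | 2 | C.A. 31921 | 26.2500 | NaN | S
423 | 424 | 0 | 3 | Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria Brogren) | female | 28.0 | 1 | 1 | 347080 | 14.4000 | NaN | S
808 | 809 | 0 | 2 | Meyer, Mr. August | male | 39.0 | 0 | 0 | 248723 | 13.0000 | NaN | S
217 | 218 | 0 | 2 | Jacobsohn, Mr. Sidney Samuel | male | 42.0 | 1 | 0 | 243847 | 27.0000 | NaN | S
643 | 644 | 1 | 3 | Foo, Mr. Choong | male | NaN | 0 | 0 | 1601 | 56.4958 | NaN | S
79 | 80 | 1 | 3 | Dowdell, Miss. Elizabeth | female | 30.0 | 0 | 0 | 364516 | 12.4750 | NaN | S
95 | 96 | 0 | 3 | Shorney, Mr. Charles Joseph | male | NaN | 0 | 0 | 374910 | 8.0500 | NaN | S
743 | 744 | 0 | 3 | McNamee, Mr. Neal | male | 24.0 | 1 | 0 | 376566 | 16.1000 | NaN | S
662 | 663 | 0 | 1 | Colley, Mr. Edward Pomeroy | male | 47.0 | 0 | 0 | 5727 | 25.5875 | E58 | S
594 | 595 | 0 | 2 | Chapman, Mr. John Henry | male | 37.0 | 1 | 0 | SC/AH 29037 | 26.0000 | NaN | S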

Dataset B

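Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
467 | 468 | 0 | 1 | Smart, Mr. John Montgomery | male | 56.0 | 0 | 0 | 113792 | 26.5500 | NaN | S
216 | 217 | 1 | 3 | Honkanen, Miss. Eliina | female | 27.0 | 0 | 0 | STON/O2. 3101283 | 7.9250 | NaN | S
662 | 663 | 0 | 1 | Colley, Mr. Edward Pomeroy | male | 47.0 | 0 | 0 | 5727 | 25.5875 | E58 | S
81 | 82 | 1 | 3 | Sheerlinck, Mr. Jan Baptist | male | 29.0 | 0 | 0 | 345779 | 9.5000 | NaN | S
250 | 251 | 0 | 3 | Reed, Mr. James George | male | NaN | 0 | 0 | 362316 | 7.2500 | NaN | S
160 | 161 | 0 | 3 | Cribb, Mr. John Hatfield | male | 44.0 | 0 | 1 | 371362 | 16.1000 | NaN | S
580 | 581 | 1 | 2 | Christy, Miss. Julie Rachel | female | 25.0 | 1 | 1 | 237789 | 30.0000 | NaN | S
619 | 620 | 0 | 2 | Gavey, Mr. Lawrence | male | 26.0 | 0 | 0 | 31028 | 10.5000 | NaN | S
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S
878 | 879 | 0 | 3 | Laleff, Mr. Kristo | male | NaN | 0 | 0 | 349217 | 7.8958 | NaN | S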

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.
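
A duplicate-row check of this kind can be sketched as counting how often each complete row occurs and keeping only rows seen more than once. This is an illustrative sketch rather than the report's own implementation; the helper name `duplicate_rows` is hypothetical, and the rows are shortened to three fields (PassengerId, Survived, Pclass) for brevity.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return {row: count} for every row that occurs more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Illustrative rows: (PassengerId, Survived, Pclass).
rows = [(65, 0, 1), (401, 1, 3), (178, 0, 1)]

found = duplicate_rows(rows)
print(found or "Dataset does not contain duplicate rows.")
```

Because PassengerId is unique in both datasets, no two full rows can be identical, which is consistent with the empty result reported here.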